Spatiotemporal Patterns of Influenza Activity in the United States

BMIN503/EPID600 Final Project

Author

Grace Li

Published

November 22, 2023


Overview

Season influenza viruses continues to be a major global burden. There are around a billion cases annually and up to 650,000 people die of influenza. Influenza activity patterns vary geographically and temporally, so we want to examine the spatiotemporal patterns of influenza activity in the United States and determine if variations in influenza activities across states or regions are associated with distribution of circulating virus strains. I will be using surveillance data published by the Centers for Disease Control and Prevention (CDC) and viral sequences from the Global Initiative on Sharing All Influenza Data (GIDSAID) to answer these questions. Understanding patterns of seasonal influenza activity can help guide early influenza surveillance and inform influenza prevention and control programs. Materials for this project can be found in this GitHub repository.

Introduction

Influenza has a large seasonal burden in the United States: up to 35 million people are infected annually causing between 12,000 and 56,000 deaths. In the U.S., flu season typically starts on the 40th week of the year (1st week of October), but influenza activity including time of onset, duration, and number of peaks varies greatly across the states.

The CDC routinely publishes weekly reports of influenza surveillance data providing a national picture of influenza virus activities in the U.S. One aspect of these surveillance is performed by clinical laboratories across the country determining the percentage of specimens positive for influenza virus. These test results allow us to evaluate if influenza cases are increasing or decreasing. Efforts by public health laboratories to sequence influenza genomes have also allowed researchers to study sequences of influenza viruses using databases like GISAID. By linking regional epidemiologic data to circulating influenza sequencing data, we hope to identify patterns of clade distribution associated with influenza activities in the U.S. An understanding of the spatial and temporal spread of influenza can help inform public health decisions in influenza prevention and control.

Methods

Data source

Download and clean data

# Load WHO/NREVSS surveillance data from clinical labs
# Retrieved from https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html
clinical_labs <- read_csv("WHO_NREVSS_Clinical_Labs.csv")
# Rename variables and order week variable 
clinical_labs <- clinical_labs |>
        dplyr::select(-c("REGION TYPE")) |>
        dplyr::rename(state = REGION, 
               year = YEAR,
               week = WEEK,
               total_specimens = `TOTAL SPECIMENS`, 
               total_A = `TOTAL A`,
               total_B = `TOTAL B`,
               percent_positive = `PERCENT POSITIVE`,
               percent_A = `PERCENT A`,
               percent_B = `PERCENT B`) |> 
        mutate(ordered_week = (week - 40) %% 52 + 1) |>
        mutate(season = ifelse(week >= 40, year, year - 1)) |>
        mutate(ordered_week = as.factor(ordered_week)) |>
        mutate(percent_positive = as.numeric(percent_positive))
# Retrieve state information and division using tigirs() and the package datasets 
states <- states()

# Obtain state information on names and division 
states_division <- data.frame(NAME = datasets::state.name, division = datasets::state.division)

states <- inner_join(states, states_division, by = "NAME") |>
            rename(state = NAME)
# Include state division info in clinical_labs df  
clinical_labs <- inner_join(states, clinical_labs, by = "state")

Results

# Percent positive cases in each division in 2017-18 season 
season2017 <- clinical_labs %>%
  filter(season == "2017")

# Percent positive cases in 2017-18 season 
season2017 |> 
  ggplot(aes(x = ordered_week, y = percent_positive, fill = division)) +
        facet_wrap(~ state, nrow = 5, ncol = 10) +
        geom_bar(stat = "identity") +
        scale_fill_brewer(palette = "Spectral") +
        scale_x_discrete(breaks = 1:52, labels = c(40:52, 1:39)) +
        scale_y_continuous() +
        labs(x = "Week", y = "Percent Positive", title = "2017-18 Influenza Positive Tests Reported to CDC") +
        theme_classic() +
        theme(legend.position = "none")

# Get unique divisions
unique_divisions <- unique(season2017$division)

# Create a list to store individual plots
season2017_plot_list <- list()

# Create plots for each division
for (i in unique_divisions) {
        
  division_data <- season2017 |> 
          filter(division == i)
  
  p <- ggplot(division_data, aes(x = ordered_week, y = percent_positive, fill = state)) +
    geom_bar(stat = "identity") +
    facet_wrap(~ state, scales = "free_y") +
    scale_fill_brewer(palette = "Spectral") +
    scale_x_discrete(breaks = 1:52, labels = c(40:52, 1:39)) +
    scale_y_continuous() +
    ylim(0, 66) +
    labs(x = "Week", y = "Percent Positive", title = i) +
    theme_classic() +
    theme(legend.position = "none")
 season2017_plot_list[[length(season2017_plot_list) + 1]] <- p
}

print(season2017_plot_list)
[[1]]


[[2]]


[[3]]


[[4]]


[[5]]


[[6]]


[[7]]


[[8]]


[[9]]

# grid.arrange(grobs = season2017_plot_list, ncol = 3) 

# Percent positive cases in 2017-18 season in Middle Atlantic
season2017 |>
        filter(division == "Middle Atlantic") |>
        ggplot(aes(x = ordered_week, y = percent_positive, fill = state)) +
        geom_bar(stat = "identity") +
        facet_wrap(~ state) +
        scale_fill_brewer(palette = "Spectral") +
        scale_x_discrete(breaks = 1:52, labels = c(40:52, 1:39)) +
        scale_y_continuous() +
        labs(x = "Week", y = "Percent Positive", title = "2017-18 Influenza Positive Tests Reported to CDC") +
        theme_classic() +
        theme(legend.position = "none")

# Percent positive cases in 2017-18 season in Pennsylvania
season2017 |>
        filter(state == "Pennsylvania") |>
        ggplot(aes(x = ordered_week, y = percent_positive, fill = division)) +
        geom_bar(stat = "identity") +
        scale_fill_brewer(palette = "Spectral") +
        scale_x_discrete(breaks = 1:52, labels = c(40:52, 1:39)) +
        scale_y_continuous() +
        labs(x = "Week", y = "Percent Positive", title = "2017-18 Pennsylvania Influenza Positive Tests Reported to CDC") +
        theme_classic() +
        theme(legend.position = "none")

# Percent positive cases in 2018-19 season 
clinical_labs |> filter(season == "2018") |>
        ggplot(aes(x = ordered_week, y = percent_positive, fill = division)) +
        facet_wrap(~ state, nrow = 5, ncol = 10) +
        geom_bar(stat = "identity") +
        scale_fill_brewer(palette = "Spectral") +
        scale_x_discrete(breaks = 1:52, labels = c(40:52, 1:39)) +
        scale_y_continuous() +
        labs(x = "Week", y = "Percent Positive", title = "2018-19 Influenza Positive Tests Reported to CDC") +
        theme_classic() +
        theme(legend.position = "none")

#Map of 48 states
# states |>
#   filter(!STATEFP %in% c("02", "15")) |> #remove Alaska and Hawaii
#   ggplot() +
#   geom_sf() +
#   theme_void()

continuous_states_clinical_labs <- clinical_labs |>
  filter(!STATEFP %in% c("02", "15"))  #remove Alaska and Hawaii

my_theme <- function() {
  theme_minimal() +                                  
  theme(axis.line = element_blank(),                 
        axis.text = element_blank(),                 
        axis.title = element_blank(),
        panel.grid = element_line(color = "white"),  
        legend.key.size = unit(0.4, "cm"),          
        legend.text = element_text(size = 8),       
        legend.title = element_text(size = 10),
        plot.title = element_text(size = 18))      
}

myPalette <- colorRampPalette(brewer.pal(9, "YlOrRd"))

continuous_season2017 <- continuous_states_clinical_labs |>
  filter(season == "2017")


# Graphing map of week 40   
st_simplify(continuous_season2017) |> 
  filter(week == "40") |>
  ggplot() +
  geom_sf(aes(fill = percent_positive), dTolerance = 1e3) + 
  my_theme() +
  scale_fill_gradientn(name = "% Positive", colours = myPalette(100),
                             limit = range(0, 70))

# Get unique divisions
unique_week <- unique(continuous_season2017$week)

# Create a list to store individual plots
season2017_map_list <- list()

# Create plots for each division
for (i in unique_week) {
        
  week_data <- continuous_season2017 |> 
          filter(week == i)
 
 weekly_map <- st_simplify(week_data) |> 
  ggplot() +
  geom_sf(aes(fill = percent_positive), dTolerance = 1e3) + 
  my_theme() +
  scale_fill_gradientn(name = "% Positive", colours = myPalette(100),
                             limit = range(0, 70))
  season2017_map_list[[length(season2017_map_list) + 1]] <- weekly_map

}

#print(season2017_map_list)

# Set up animation options
ani.options(interval = 1, outdir = getwd(), ani.width = 800, ani.height = 600)

# Create animation
saveGIF({
  for (i in seq_along(season2017_map_list)) {
    print(season2017_map_list[[i]])
  }
}, movie.name = "2017_positive.gif", interval = 0.5, ani.width = 800, ani.height = 600)
[1] TRUE
knitr::include_graphics("2017_positive.gif")

# Load CDC data
state_ili_case <- read_csv("StateDatabySeason.csv")

state_ili_case <- state_ili_case |>
        select(-c(URL, WEBSITE)) |>
        rename(state = STATENAME, 
               level = `ACTIVITY LEVEL`, 
               label = `ACTIVITY LEVEL LABEL`,
               weekend = WEEKEND,
               week = WEEK,
               season = SEASON) |> 
        mutate(level = as.numeric(gsub("Level ", "", level))) |>
        mutate(ordered_week = (week - 40) %% 52 + 1) 

# ILI activity in 2018-19 
state_ili_case |> filter(season == "2018-19") |>
        ggplot(aes(x = ordered_week, y = level, color = state)) +
        facet_wrap(~ division) +
        geom_point() +
        geom_line() +
        # scale_color_brewer(palette = "Set1") +
        scale_x_continuous(breaks = 1:52, labels = c(40:52, 1:39)) +
        scale_y_continuous(breaks = 1:13) +
        labs(x = "Week", y = "ILI Activity Level", title = "2018-19 ILI Activity") +
        theme_bw()

# ILI activity in 2018-19; set colors

color <- brewer.pal(11, "Spectral")
color <- colorRampPalette(color)(50)

state_ili_case |> filter(season == "2018-19") |> 
        # group_by(division) |>
        ggplot(aes(x = ordered_week, y = level, color = state)) +
        facet_wrap(. ~ division) +
        geom_point() +
        geom_line() +
        scale_color_manual(values = color) +
        scale_x_continuous(breaks = 1:52, labels = c(40:52, 1:39)) +
        labs(x = "Week", y = "ILI Activity Level", title = "2018-19 ILI Activity") +
        theme_bw()
pa_ili_case <- state_ili_case |>
        filter(state == "Pennsylvania")

# ILI activity in Pennsylvania (all seasons)
pa_ili_case |> ggplot(aes(x = ordered_week, y = level, color = season)) +
        geom_point() +
        geom_line() +
        scale_x_continuous(breaks = 1:52, labels = c(40:52, 1:39)) +
        labs(x = "Week", y = "ILI Activity Level", title = "title") +
        theme_classic()


# ILI activity in Pennsylvania (select seasons)
pa_ili_case |> filter(season %in% c("2017-18", "2018-19", "2019-20")) |>
  ggplot(aes(x = ordered_week, y = level, color = season)) +
        geom_point() +
        geom_line() +
        scale_x_continuous(breaks = 1:52, labels = c(40:52, 1:39)) +
        scale_y_continuous(breaks = 1:13) +
        labs(x = "Week", y = "ILI Activity Level", title = "Pennsylvania ILI Activity") +
        theme_classic()


# test2 --> crazy graphs 
state_ili_case |> ggplot(aes(x = ordered_week, y = level)) +
        geom_point() +
        geom_line() +
        facet_grid(season ~ division) +
        scale_x_continuous(breaks = 1:52, labels = c(40:52, 1:39)) +
        scale_y_continuous(breaks = 1:13) +
        labs(x = "Week", y = "ILI Activity Level", title = "Pennsylvania ILI Activity") +
        theme_classic()
        

# ILI activity in Pennsylvania in 2018-2019 
pa_ili_case |> filter(season == "2018-19") |>
        ggplot(aes(x = ordered_week, y = level)) +
        geom_point() +
        geom_line() +
        scale_x_continuous(breaks = 1:52, labels = c(40:52, 1:39)) +
        scale_y_continuous(breaks = 1:13) +
        labs(x = "Week", y = "ILI Activity Level", title = "2018-19 ILI Activity") +
        theme_classic()

Describe your results and include relevant tables, plots, and code/comments used to obtain them. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you’d like, but this is not required.

References